Graph-Based Reinforcement Learning for Movement Prediction

As part of my master’s thesis in Informatics with a focus on intelligent systems (data science), I investigated a novel concept for the autonomous control of drone swarms. The aim of the thesis, entitled “Graph-based reinforcement learning for collaborative motion prediction of drones”, was to control several drones in a swarm so that they navigate autonomously to a target point while avoiding collisions.

Problem definition and objective

The main objective of the project was to develop a model that allows drones, represented as a graph, to navigate autonomously to target points without colliding. Each drone had to be able to predict the movements of other drones and adjust its own trajectory accordingly.

Subproblems:

Graph representation of the drones: The drones and their environment were modeled as graphs, with each edge representing a connection between the drones based on their relative position and flight direction.
Reinforcement learning (RL): Control of the drones was learned by an RL algorithm that responds to continuous feedback from the environment and thereby learns to avoid collisions.
Prediction of drone movements: To effectively prevent collisions, the model needed to correctly predict the movements of other drones within the swarm.
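The graph representation described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the `Drone` class, the `sensing_radius` parameter, and the choice of edge features (relative position plus relative velocity as a stand-in for flight direction) are assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Drone:
    """Hypothetical 2D drone state: position and velocity."""
    x: float
    y: float
    vx: float
    vy: float


def build_graph(drones, sensing_radius=5.0):
    """Return (edge_index, edge_attr) for all drone pairs within sensing_radius.

    edge_index holds directed (source, target) node pairs. edge_attr holds the
    relative position and relative velocity of the target as seen from the
    source. sensing_radius is an assumed parameter, not taken from the thesis.
    """
    edge_index, edge_attr = [], []
    for i, a in enumerate(drones):
        for j, b in enumerate(drones):
            if i == j:
                continue  # no self-loops
            dx, dy = b.x - a.x, b.y - a.y
            if math.hypot(dx, dy) <= sensing_radius:
                edge_index.append((i, j))
                edge_attr.append((dx, dy, b.vx - a.vx, b.vy - a.vy))
    return edge_index, edge_attr


# Two nearby drones and one that is too far away to be connected.
swarm = [Drone(0, 0, 1, 0), Drone(3, 4, 0, -1), Drone(20, 20, 0, 0)]
edges, attrs = build_graph(swarm)
```

Only relative quantities are stored on the edges, so the representation does not depend on where the swarm is located in the world.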

Analysis of reinforcement learning algorithms

Two RL algorithms were analyzed:

Deep Q-Learning (DQN): A value-based algorithm for discrete action spaces that approximates a Q-value function with a neural network and selects the action with the highest estimated value.
Proximal Policy Optimization (PPO): A policy-gradient algorithm that handles continuous as well as discrete action spaces and keeps learning stable by limiting how far each update can move the policy.

Both algorithms were combined with a graph-based learning method to model the relations between the drones.
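To make the difference between the two algorithm families concrete, the sketch below contrasts the greedy selection over discrete Q-values used by DQN-style agents with the clipped surrogate objective that PPO maximizes. It is a generic textbook illustration, not code from the thesis; the function names and the default `clip_eps=0.2` are assumptions.

```python
def greedy_action(q_values):
    """DQN-style choice: index of the discrete action with the highest Q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def ppo_clipped_objective(ratio, advantage, clip_eps=0.2):
    """PPO clipped surrogate objective for a single sample.

    ratio is pi_new(a|s) / pi_old(a|s); advantage estimates how much better
    the action was than the policy's average behavior. Clipping the ratio to
    [1 - clip_eps, 1 + clip_eps] removes the incentive for large policy steps.
    """
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)


# A discrete agent picks one of a fixed set of actions ...
best = greedy_action([0.1, 0.7, 0.3])
# ... while PPO caps the gain from moving far away from the old policy.
capped = ppo_clipped_objective(ratio=1.5, advantage=2.0)
```

In the second call the unclipped term would be 1.5 times the advantage, but the objective only credits the update up to a ratio of 1 + `clip_eps`.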

Spline-based convolutional layers in the Graph Neural Network

A key component of the model was the use of spline-based convolutional layers in the Graph Neural Network (GNN). These layers let the model learn distances between drones implicitly, without metric distances being passed in explicitly. In this way, the drones could effectively process information about their neighborhood and avoid collisions. This was implemented with the Python library PyTorch Geometric.
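The idea behind spline-based convolution can be illustrated without any deep learning library. In the sketch below, the relative offset between two drones is mapped to a pseudo-coordinate in [0, 1], and a degree-1 open B-spline basis turns that coordinate into interpolation weights over a small set of trainable kernel control points. Because the kernel value varies continuously with the pseudo-coordinate, a network can learn distance-dependent behavior from relative positions alone. This is a simplified, one-dimensional illustration and not PyTorch Geometric's implementation; `max_offset`, the control point values, and all function names are assumptions.

```python
def pseudo_coordinate(offset, max_offset):
    """Map a relative offset in [-max_offset, max_offset] to [0, 1]."""
    u = (offset + max_offset) / (2.0 * max_offset)
    return min(1.0, max(0.0, u))  # clamp offsets outside the assumed range


def linear_bspline_basis(u, kernel_size):
    """Weights of a degree-1 open B-spline with kernel_size control points.

    Returns two (control_point_index, weight) pairs whose weights sum to 1.
    """
    if kernel_size < 2:
        raise ValueError("kernel_size must be at least 2")
    v = u * (kernel_size - 1)
    lower = min(int(v), kernel_size - 2)  # keep lower + 1 inside the kernel
    frac = v - lower
    return [(lower, 1.0 - frac), (lower + 1, frac)]


def kernel_weight(u, control_points):
    """Continuous kernel value: interpolate the control points at u."""
    basis = linear_bspline_basis(u, len(control_points))
    return sum(w * control_points[i] for i, w in basis)


# Hypothetical learned control points along one axis of the kernel.
control_points = [0.0, 1.0, 2.0, 3.0, 4.0]
u = pseudo_coordinate(offset=5.0, max_offset=10.0)
weight = kernel_weight(u, control_points)
```

In a real layer the control points would be weight matrices learned by gradient descent, and the basis would be evaluated per dimension of the pseudo-coordinates.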

Evaluation of the RL algorithms

The two RL algorithms were tested with regard to their ability to guide the drones safely to their destination while avoiding collisions.

Deep Q-Learning (DQN) proved to be unsuitable. It could neither prevent collisions nor successfully guide the drones to their destination.
Proximal Policy Optimization (PPO), on the other hand, showed promising results. PPO made it possible for the drones to reach the target while avoiding collisions.

The implementations were evaluated in a custom environment developed for this purpose and based on OpenAI Gym.
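A rough idea of what such an environment can look like is given below. The class mirrors the classic Gym convention of `reset()` returning an observation and `step()` returning `(observation, reward, done, info)`, but it does not import Gym so that it runs on its own. Everything specific here is an assumption for illustration: the class name `DroneSwarmEnv`, the 2D displacement actions, the progress-based reward, the collision penalty of 10, and the radii.

```python
import math


class DroneSwarmEnv:
    """Toy 2D swarm environment with a Gym-like reset()/step() interface."""

    def __init__(self, starts, targets, collision_radius=0.5, goal_radius=0.5):
        self.starts = [tuple(p) for p in starts]
        self.targets = [tuple(t) for t in targets]  # one target per drone
        self.collision_radius = collision_radius
        self.goal_radius = goal_radius
        self.positions = list(self.starts)

    def reset(self):
        """Put all drones back on their start positions."""
        self.positions = list(self.starts)
        return list(self.positions)

    def _remaining(self):
        """Distance of each drone to its own target."""
        return [math.hypot(p[0] - t[0], p[1] - t[1])
                for p, t in zip(self.positions, self.targets)]

    def step(self, actions):
        """Apply one (dx, dy) displacement per drone."""
        before = sum(self._remaining())
        self.positions = [(x + dx, y + dy)
                          for (x, y), (dx, dy) in zip(self.positions, actions)]
        remaining = self._remaining()

        collided = any(
            math.hypot(a[0] - b[0], a[1] - b[1]) < self.collision_radius
            for i, a in enumerate(self.positions)
            for b in self.positions[i + 1:]
        )
        arrived = all(d <= self.goal_radius for d in remaining)

        reward = before - sum(remaining)  # progress toward the targets
        if collided:
            reward -= 10.0  # assumed collision penalty
        done = collided or arrived
        return list(self.positions), reward, done, {"collided": collided}


env = DroneSwarmEnv(starts=[(0, 0), (0, 4)], targets=[(3, 0), (3, 4)])
obs = env.reset()
obs, reward, done, info = env.step([(1, 0), (1, 0)])
```

An episode ends either when two drones come closer than `collision_radius` or when every drone is within `goal_radius` of its target.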

Curriculum Learning

Another important finding was the benefit of introducing Curriculum Learning into the training process. This method facilitated the training of the RL agent by gradually increasing the complexity of the tasks. With the difficulty raised step by step, PPO learned more efficiently and showed improved results in collision avoidance and target approach.
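One common way to schedule such a curriculum is to move to the next difficulty stage once the agent's recent success rate passes a threshold. The sketch below shows that pattern under stated assumptions: the `Curriculum` class, the stage definitions, the threshold `promote_at`, and the `window` length are hypothetical and not taken from the thesis.

```python
class Curriculum:
    """Advance to the next stage once the agent succeeds often enough."""

    def __init__(self, stages, promote_at=0.8, window=20):
        self.stages = stages
        self.promote_at = promote_at  # required success rate
        self.window = window          # number of recent episodes considered
        self.level = 0
        self.recent = []

    @property
    def stage(self):
        return self.stages[self.level]

    def record(self, success):
        """Log one episode outcome and return the stage for the next episode."""
        self.recent.append(bool(success))
        self.recent = self.recent[-self.window:]
        window_full = len(self.recent) == self.window
        rate = sum(self.recent) / self.window
        has_next = self.level < len(self.stages) - 1
        if window_full and rate >= self.promote_at and has_next:
            self.level += 1
            self.recent = []  # start counting again on the harder stage
        return self.stage


# Hypothetical stages: more drones and a longer route at each level.
stages = [
    {"num_drones": 1, "target_distance": 5.0},
    {"num_drones": 2, "target_distance": 10.0},
    {"num_drones": 4, "target_distance": 20.0},
]
curriculum = Curriculum(stages, promote_at=0.8, window=5)
for _ in range(5):
    next_stage = curriculum.record(success=True)
```

In a training loop, the returned stage would be used to configure the environment before the next episode.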

Conclusion

The master’s thesis showed that graph-based reinforcement learning is a viable approach to controlling drone swarms. In particular, Proximal Policy Optimization in combination with Graph Neural Networks and spline-based convolutional layers proved to be a suitable method for navigating drones safely and efficiently. Curriculum Learning further improved the training process.

Activities

Implementation and adaptation of two reinforcement learning algorithms: DQN and PPO
Implementation of an abstract virtual environment for simulating drone flights in OpenAI Gym
Evaluation of the learning success of the machine learning methods in several experiments
Documentation and defense of the results